Add multi-edit capabilities to Speech Editing #94
Thanks! Really helpful contribution!
I meant multiple types of edits. If I try to do a deletion and a substitution at the same time, for example: original: "But when I had approached so near" The inference fails with the following error (I'll edit once it finishes running again). However, if I want multiple different insertions, deletions, or substitutions, everything just works as long as I don't mix and match.
I see the reason it fails is probably because I used
I see. I am doing more testing right now and I think you're right: supplying multiple different types of edits seems to work as long as there is a sizeable gap between them. So doing something like this on words right next to each other can only work if the margin size is small enough? Not sure if this is something I can fix. Do you have any suggestions? I can probably just throw an error instead and suggest the user lower the margin, and do the same when editing the very last word, as you mentioned.
Regarding the issue of spans being too close: both approaches are sensible to me.
If you want to do large-scale testing, https://github.com/jasonppy/VoiceCraft/blob/master/RealEdit.txt contains 310 speech editing examples, and there are 40 2-span edit examples. To interpret the example:
where
Are there any drawbacks to lowering the margin that I should be aware of? The cases where my algorithm breaks no longer break if I lower it to 0.02 sec, so this should be an easy solution. I can keep lowering the margin until the spans align properly to make sure it works in all cases. orig: But when I had
The only drawback is that the forced alignment might not be perfect, and a larger margin leaves room for such mistakes. A large margin also ensures that the neighboring (but unchanged) words are modified as well, so they transition smoothly into the changed words. Therefore, defaulting it to 0.02 sec wouldn't be great.
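For anyone following along, here is a minimal sketch of the "keep lowering the margin until the spans fit" idea discussed above. It is not the code from this pull request. The function name `shrink_margin_ms`, the 80 ms starting margin, the 10 ms step, and the rule that two padded spans may touch but not overlap are all assumptions made for illustration. Times are in integer milliseconds to avoid floating-point drift.

```python
def shrink_margin_ms(spans_ms, margin_ms=80, step_ms=10, floor_ms=0):
    """Return the largest margin (<= margin_ms, lowered in step_ms steps)
    for which neighbouring edit spans, each padded by the margin on both
    sides, do not overlap.

    spans_ms: list of (start_ms, end_ms) tuples for the words being edited.
    All defaults are hypothetical values, not VoiceCraft's actual settings.
    """
    ordered = sorted(spans_ms)
    # Silence/speech between the end of one span and the start of the next.
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    # Two padded spans overlap when the gap is smaller than twice the margin.
    while margin_ms > floor_ms and any(gap < 2 * margin_ms for gap in gaps):
        margin_ms = max(floor_ms, margin_ms - step_ms)
    if any(gap < 2 * margin_ms for gap in gaps):
        # Even the smallest allowed margin does not separate the spans.
        raise ValueError("edit spans overlap; consider merging them")
    return margin_ms


# Two edited words only 50 ms apart force the margin down from 80 ms.
print(shrink_margin_ms([(1000, 1200), (1250, 1500)]))
```

Starting from a larger default and shrinking only when needed keeps the benefit described above (room for alignment errors and smoother transitions) for edits that are far apart, while still letting adjacent edits run.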
I used the margin fix. Regenerates the
Hi, I'm interested in testing the multi-span editing algorithm.
@jasonppy This should be ready to merge. The example original and target transcripts use a pretty complex set of changes just to show what is now possible.
The algorithm seems to work from my testing.
This pull request implements a heavily modified edit distance algorithm to handle doing multiple edits at the same time.
It also gets rid of the need for the user to specify the edit type(s), everything is handled automatically.
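To illustrate the general idea (not the heavily modified algorithm in this pull request), here is a minimal sketch of how a plain word-level edit distance with a backtrace can recover the edit types automatically from the original and target transcripts. The function names `word_edit_ops` and `edit_spans` and the target sentence in the usage line are hypothetical and chosen only for this example.

```python
def word_edit_ops(orig_words, target_words):
    """Align two word lists with standard Levenshtein DP and return a list of
    (kind, orig_index, target_index) operations, where kind is one of
    "match", "substitute", "delete", "insert"."""
    n, m = len(orig_words), len(target_words)
    # dist[i][j] = edit distance between orig_words[:i] and target_words[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if orig_words[i - 1] == target_words[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete orig word
                dist[i][j - 1] + 1,         # insert target word
                dist[i - 1][j - 1] + cost,  # match or substitute
            )
    # Walk back from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and orig_words[i - 1] == target_words[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            ops.append(("match", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("substitute", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


def edit_spans(orig, target):
    """Group consecutive non-match operations into half-open spans
    (orig_start, orig_end, target_start, target_end) over word indices."""
    spans, current = [], None
    for kind, i, j in word_edit_ops(orig.split(), target.split()):
        if kind == "match":
            if current is not None:
                spans.append(tuple(current))
                current = None
            continue
        if current is None:
            current = [i, i, j, j]
        if kind in ("delete", "substitute"):
            current[1] = i + 1  # this op consumes one original word
        if kind in ("insert", "substitute"):
            current[3] = j + 1  # this op consumes one target word
    if current is not None:
        spans.append(tuple(current))
    return spans


# Hypothetical target: deletes "had" and inserts "very" in one pass.
print(edit_spans("But when I had approached so near",
                 "But when I approached so very near"))
```

A span whose original range is empty is a pure insertion, one whose target range is empty is a pure deletion, and anything else is a substitution, so the caller never has to name the edit type.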
Known issues:
Like the previous implementation, edits to the last index of the input sentence do not work. This looks like an issue of the model's inference, as in both my and the original implementation these changes are simply not recognized.
Furthermore, multiple edit types cannot happen at the same time. For example, mixing and matching substitutions with insertions crashes in inference. This is again something I still need to look into. Is this a limitation of the model itself?
I'd appreciate some help testing any other edge cases in the speech editing jupyter notebook if anyone is interested - I believe I have them all covered but more testing can't hurt :)
I will update the Google Colab for speech editing once this is merged.